What's in a name

Graham Rhind

When you live in a country such as the United Kingdom or the United States of America, you’ll know that certain names are common – John Smith, Steve Jones, Mark Brown. But you’ll also know that they are not THAT common – the chance of two people called John Smith living at the same address and not being related to each other is negligible.

For Names’ Sake

This is not the case in other countries, often because of cultural but also legal differences. In countries such as the USA people are free to name themselves or their offspring in any way they wish, as people like Beezow Doo-Doo Zopittybop-bop-bop can attest. In other countries laws exist which reduce this freedom to a greater or lesser extent. In New Zealand, for example, personal names must be approved by the authorities. Names containing numerals are not allowed, nor are any names likely to cause problems for the child. In Germany, until recently, given names had to be “by nature” given names (i.e. could not include family names, common nouns and so on); they had to be gender-specific and they still may not have the potential to cause harm to the namee (e.g. Mickey Mouse, Cain, Osama bin Laden). In Morocco, a list exists from which parents are obliged to choose the given name(s) of their children.

As a result, the variety of names differs widely in different cultures.

A Data Quality Minefield

In England and Wales 5.7% of people share the 10 most common surnames. France has over 900,000 different surnames. Belgium has the greatest number of surnames per inhabitant, and Italy has the best spread of names, with only 0.67% of the population sharing the top 10 surnames. Comparatively fewer surnames exist in Denmark, Spain and Sweden, where the top 10 names are shared by 25.93%, 19.65% and 19.5% of the population respectively. Bring 100 random Danes together in a room and throw a rock, and you have roughly a 1 in 4 chance of hitting a Jensen, Nielsen, Hansen, Pedersen, Andersen, Christensen, Larsen, Sørensen, Rasmussen or Jørgensen!
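The "top 10 share" figures quoted above can be calculated for any list of records in the same way. The following is a minimal sketch in Python, assuming the data is simply a list of surname strings; the function name `top_n_share` and the sample list are invented for illustration and are not real population data.

```python
from collections import Counter


def top_n_share(surnames, n=10):
    """Return the fraction of records whose surname is among the n most common."""
    counts = Counter(surnames)
    if not counts:
        return 0.0
    # Add up the record counts of the n most frequent surnames.
    top_total = sum(count for _, count in counts.most_common(n))
    return top_total / sum(counts.values())


# Toy sample (made up for illustration, not real population data).
sample = ["Jensen"] * 3 + ["Nielsen"] * 2 + ["Hansen", "Rhind", "Smith", "Jones", "Brown"]

print(top_n_share(sample, n=2))  # 5 of the 10 records -> 0.5
```

Run against a real customer file, a high result would suggest that surname alone says little about who a record belongs to.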

This pales into insignificance when looking at many Asian cultures, though. In Vietnam the top 10 names are shared by 82.9% of the population. In China the top 100 names are shared by 87% of the population, so that parents are very inventive with given names to try to provide a unique moniker for their offspring; and in Korea the entire population shares no more than 300 surnames, with the majority sharing just three: Kim, Pak and Yi.

Clearly, when using names in any processes, this variety – or lack of it – needs to be taken into account. When you’re used to the wide range of surnames many of us share in Western Europe, the more homogeneous names used in other societies can provide quite a headache for data management!
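One rough way to gauge the size of that headache is to estimate how often two unrelated records would share a surname purely by chance. The sketch below assumes records are drawn independently, which is a simplification, and sums the squared share of each surname; the helper name `surname_collision_probability` and both sample lists are hypothetical.

```python
from collections import Counter


def surname_collision_probability(surnames):
    """Chance that two independently drawn records share a surname.

    Computed as the sum of squared surname shares (an approximation).
    """
    counts = Counter(surnames)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum((count / total) ** 2 for count in counts.values())


# Toy data, made up for illustration only.
varied = ["Smith", "Jones", "Brown", "Rhind"]   # every surname distinct
concentrated = ["Kim", "Kim", "Pak", "Yi"]      # few surnames, one dominant

print(surname_collision_probability(varied))        # 0.25
print(surname_collision_probability(concentrated))  # 0.375
```

The higher the figure, the less a matching process can rely on the surname and the more weight it needs to give to other fields, such as given name, address or date of birth.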

About The Author Graham Rhind

Graham Rhind is an acknowledged expert in the field of data quality. He runs his own consultancy company, GRC Database Information, based in The Netherlands, where he researches postal code and addressing systems, collates international data, runs a busy postal link website and writes data management software. Graham speaks regularly on the subject and is the author of four books on the topic of international data management. You can find him on Twitter via @grahamrhind.

Practical International Data Management Online. A free resource from GRC Data Intelligence. For comments, questions or feedback: pidm@grcdi.nl